If for any reason the graphs do not show on the GitHub Pages site, please look at full_doc.ipynb in the infovis repository on GitHub, on branch gh_pages.
Diabetes and obesity are growing problems in today's society. Unhealthy eating habits and inconsistent diets are contributing to an overall decline in people's general health. These behaviors can be traced back to multiple causes. The goal of this data study is to investigate the complex interplay between these causes. Using two extensive datasets, we will examine the correlations between consumers' diet, their health, and several external factors, such as income levels and food prices. We will also analyze further variables, for example physical activity levels, addictions and education.
The ultimate goal of this research is to shed light on the nature and reasoning behind people's choices regarding diet and health, in order to perhaps develop strategies to improve the public's overall health.
The main dataset that we will be using for our project is called 'Diabetes Health Indicators Dataset', derived from the Behavioral Risk Factor Surveillance System (BRFSS), a yearly health survey published by the CDC. Our dataset is from the year 2015. It consists of answers from 253,680 individuals, covering risk behaviors, health issues, eating and drug habits, etc. The dataset comes from a zip file containing three datasets in total, but our focus will be on “diabetes_binary_health_indicators_BRFSS2015”, which categorizes the presence of diabetes into two simple categories: those with diabetes, and those without. These numbers, in combination with the different variables from the survey responses regarding health and diet, will help us answer our questions.
The secondary dataset used in this research, 'Food Prices in Turkey', consists of the average prices of different kinds of whole foods in Turkey over a number of years. Although its data is not from the United States of America, as is the case for the main dataset, the data is clean and complete, and can still be preprocessed to fit our research as well as possible. For instance, the data that does not originate from the year 2015 can be filtered out, so that the remaining data is from the same year as our main dataset.
Our preprocessed dataset now only includes data from 2015, just like our main dataset. Furthermore, assigning the different foods within the dataset to their respective food categories results in the following list of categories:
Grains & Potatoes (Rice, Wheat flour, Pasta, Bulgur, Bread (common), Bread (pita), Potatoes) --> carbohydrates
Legumes (Beans (white), Lentils, Chickpeas, Peas (green, dry)) --> carbohydrates / proteins
Fruits & Vegetables (Apples (red), Bananas, Oranges, Tomatoes, Garlic, Onions, Cabbage, Cauliflower, Cucumbers (greenhouse), Spinach, Eggplants) --> vitamins / fibers
Meats (Meat (chicken), Meat (mutton), Meat (veal), Fish (fresh)) --> proteins
Dairy (Milk (pasteurized), Yogurt, Cheese) --> fats / proteins
Sugar --> carbohydrates
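The preprocessing described above (keeping only 2015 and assigning each product to a food group) could be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names `date`, `product` and `price`, the helper `prepare_prices`, and the sample rows are assumptions made for illustration, and the mapping is abbreviated.

```python
import pandas as pd

# Hypothetical, abbreviated mapping from product name to the food groups
# listed above; a full mapping would cover every product in the list.
FOOD_CATEGORIES = {
    "Rice": "Grains & Potatoes",
    "Potatoes": "Grains & Potatoes",
    "Lentils": "Legumes",
    "Tomatoes": "Fruits & Vegetables",
    "Meat (chicken)": "Meats",
    "Yogurt": "Dairy",
    "Sugar": "Sugar",
}


def prepare_prices(prices: pd.DataFrame, year: int = 2015) -> pd.DataFrame:
    """Keep rows from a single year and attach a food-category column."""
    out = prices.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out[out["date"].dt.year == year].copy()
    out["category"] = out["product"].map(FOOD_CATEGORIES)
    # Products that are not in the mapping get NaN and are dropped.
    return out.dropna(subset=["category"]).reset_index(drop=True)


# Made-up sample rows, only to show the expected shape of the input.
sample = pd.DataFrame({
    "date": ["2014-06-15", "2015-01-15", "2015-02-15", "2015-02-15"],
    "product": ["Rice", "Rice", "Meat (chicken)", "Tea"],
    "price": [4.0, 4.5, 9.0, 20.0],
})
prices_2015 = prepare_prices(sample)
print(prices_2015.groupby("category")["price"].mean())
```

Keeping the mapping in a plain dictionary makes it easy to check which products ended up in which group.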
One of the most prominent causes of nutritional inequality is income inequality. Through examination of incomes and food prices, our aim is to expose and display the relationship between economic factors and the differential access to various food categories. By comparing certain statistics on food prices with the recommended consumption of each food category, our hope is to uncover economic factors that contribute to divergent dietary patterns. We will also look briefly at education levels and their correlation with health. The overall theme of this section is outside factors that consumers do not have a direct influence on. Income and education are often predetermined, yet, as will be evident, can have significant influence on health. Our results might lead to conclusions that could help with finding solutions and interventions for prominent health risks in today's society.
As mentioned above, the whole foods from our dataset 'Food Prices in Turkey' were categorized into food groups: grains & potatoes, legumes, fruits & vegetables, meats, dairy and sugar. According to the WHO, a healthy diet should consist of different amounts of calories from all categories. Grains & potatoes are a great source of carbohydrates, for example, whilst meats and dairy are important sources of protein. Obesity and diabetes are often associated with overconsumption of carbohydrates. Combining the average food prices in 2015 with the recommended food consumption gives the following visualization:
The most notable statistic from this visualization is the high cost of meat, which is a main source of protein for many. Other sources of proteins and healthy fats, such as dairy and legumes, are also more expensive than most sources of carbohydrates. Carbohydrates like pasta, potatoes and direct sugars are often eaten in abundance by people who are considered unhealthy. When viewing the recommended nutrition of all food categories, it stands out that carbohydrates, and especially sugars, should be eaten in moderation. Prices of vegetables and fruits are fortunately relatively low, but as mentioned before, this data regards whole foods only. Particularly in the USA, processed foods tend to become more expensive with a more 'healthy image', whereas junk food prices remain the same.
To clarify the significance of this and to support our argument, we decided to add a graph showing how food prices change over time.
This is in support of our argument, because obesity has gradually been increasing in the USA, according to research by the CDC. As previously discussed, the more expensive food types (meats & dairy) are those that should be eaten more, and those that are already eaten in abundance are cheaper. The graph above shows that the prices of carbs and sugar do not significantly change, even though these are the foods that should be eaten with more moderation. In contrast, protein-rich foods and even fruits and vegetables are gradually becoming more expensive, and therefore less available.
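A comparison of price trends like the one above can be computed by averaging prices per category and year and expressing each year relative to the first year in the data. The sketch below assumes a table with `category`, `year` and `price` columns; the function name `yearly_change` and the sample values are made up for illustration.

```python
import pandas as pd


def yearly_change(prices: pd.DataFrame) -> pd.DataFrame:
    """Mean price per category and year, plus the change versus the first year (%)."""
    yearly = (
        prices.groupby(["category", "year"], as_index=False)["price"]
        .mean()
        .sort_values(["category", "year"])
    )
    # Price of the earliest year within each category, repeated on every row.
    first = yearly.groupby("category")["price"].transform("first")
    yearly["change_pct"] = (yearly["price"] / first - 1.0) * 100.0
    return yearly


# Made-up sample values, not taken from the real dataset.
sample = pd.DataFrame({
    "category": ["Meats", "Meats", "Sugar", "Sugar"],
    "year": [2014, 2015, 2014, 2015],
    "price": [20.0, 25.0, 3.0, 3.0],
})
trend = yearly_change(sample)
print(trend)
```

Expressing prices relative to a common starting year makes categories with very different price levels comparable on one axis.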
The impact of economics on dietary habits can also be derived directly by comparing the two fields. This way, all foods are taken into consideration, including processed foods such as junk food and premade meals. Using the main dataset 'Diabetes Health Indicators Dataset', a comparison between the subjects' health and their income levels can be visualized.
For analysis of the participants' BMI values in this dataset, they can be aggregated into four different categories. These categories were established to conform to the statistics of the WHO as follows:
Using these specifications, our dataset makes for the following graph:
This graph shows that over two thirds of the survey respondents are overweight, about half of whom suffer from obesity. Besides underlining the importance of the subject we are discussing, it also gives us a clear view of the distribution of BMI in the USA.
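The four-way BMI grouping can be sketched with `pd.cut`. The cut-offs below are the standard WHO adult thresholds (underweight below 18.5, normal weight 18.5 to below 25, overweight 25 to below 30, obese 30 and above); they are assumed here to match the categories used for the graph. The column name `BMI` is taken to be the one in the main dataset, and the sample values are invented.

```python
import pandas as pd

# Standard WHO adult BMI cut-offs (assumed to match the four groups used here).
BMI_BINS = [0, 18.5, 25, 30, float("inf")]
BMI_LABELS = ["Underweight", "Normal weight", "Overweight", "Obese"]


def bmi_category(bmi: pd.Series) -> pd.Series:
    """Assign each BMI value to one of the four WHO categories."""
    # right=False makes each bin [lower, upper), so a BMI of exactly 25
    # counts as overweight and exactly 30 counts as obese.
    return pd.cut(bmi, bins=BMI_BINS, labels=BMI_LABELS, right=False)


# Invented sample values covering every category and both boundaries.
sample = pd.DataFrame({"BMI": [17.0, 22.0, 25.0, 29.9, 30.0, 41.0]})
sample["bmi_category"] = bmi_category(sample["BMI"])

# Share of respondents per category, which is what the bar chart would show.
shares = sample["bmi_category"].value_counts(normalize=True)
print(shares)
```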
We can now consider any correlations between BMI and average income. It is important to note that this dataset defines a scale of eight different annual income groups, where 1 = less than $10,000, 5 = less than $35,000 and 8 = $75,000 or more. If we put these numbers together, we get the following graph:
There is a surprisingly linear relationship between the income group of American citizens and their average BMI: the higher the income, the lower the BMI. This supports the theory that people with a higher income are more likely to purchase healthier foods. The data suggests that, as income decreases, the availability of healthier foods decreases as well. It would seem that poor dietary decisions are less a matter of choice than a matter of income. Better diets might be promoted by lowering the prices of healthier foods. The overconsumption of junk food and other unhealthy (processed) foods could perhaps be discouraged by applying a tax on these foods.
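The comparison behind this graph amounts to averaging BMI within each income group. The sketch below assumes the columns are named `Income` (codes 1 to 8) and `BMI`; the helper `mean_bmi_by_group` and the sample values are hypothetical. Because the income codes are ordered brackets rather than dollar amounts, the correlation coefficient is only a rough indicator of how linear the trend is.

```python
import pandas as pd


def mean_bmi_by_group(df: pd.DataFrame, group_col: str, bmi_col: str = "BMI") -> pd.Series:
    """Average BMI per group code, ordered by the code."""
    return df.groupby(group_col)[bmi_col].mean().sort_index()


# Invented sample values, not taken from the real survey.
sample = pd.DataFrame({
    "Income": [1, 1, 5, 5, 8, 8],
    "BMI": [31.0, 29.0, 28.5, 27.5, 27.0, 26.0],
})
by_income = mean_bmi_by_group(sample, "Income")

# Pearson correlation between the income code and the group's mean BMI.
r = by_income.index.to_series().corr(by_income)
print(by_income)
print(round(r, 3))
```

The same helper works for any ordinal grouping column in the dataset, since it only relies on the group codes being sortable.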
The first indirect cause of unhealthy nutrition behavior that can be analyzed is education. It is important to note that, as with income, our main dataset defines a scale of 6 different education levels, where 1 = Never attended school or only kindergarten, 2 = Grades 1 through 8, etc. When visualizing the public's health and education levels, we get the following graph:
There are two interpretations of this graph. Again, there is a clear correlation between the two variables: the higher the education level, the lower the average BMI. This could mean that people with a higher level of education are more aware of the choices they make regarding their diet. Higher education would lead to smarter choices and therefore a healthier diet.
A second interpretation is that higher education leads to better jobs. People who are better educated have better opportunities in the job market, and are therefore able to make more money. This interpretation would support our previous argument regarding the effect of income inequality on dietary habits.
Within our second perspective, we focus on the role of personal responsibility in shaping healthier dietary choices. For this perspective, other data regarding more personal choices and secondary factors can be used, such as physical activity, drug usage and food choices. With this information, we might find correlations between everyday habits and overall health outcomes. We will analyze these correlations and visualize them, which could help in finding effective methods of improving society’s overall health with not just nutrition policies, but perhaps also through other, second-hand means that may be overlooked at first glance.
We recognize the significance of secondary factors that influence dietary habits. By analyzing data regarding personal choices and habits, we may find a different perspective on influences on health. One aspect we explore is addictions like smoking and drinking, and whether they correlate with bad dietary habits. By observing these correlations, underlying questions can be answered. Do individuals who engage in addictive behaviors neglect healthy eating habits more often? Or do these unhealthy habits not tell us anything about the likelihood of unhealthy diets?
To visualize eating habits compared to addictions, we decided to categorize the survey's participants from our dataset into two groups: 'healthy eaters' and 'unhealthy eaters'. These groups were defined by whether they eat fruits and vegetables every day, which is a statistic also listed in our dataset. Healthy eaters are those for whom both are true, and unhealthy eaters are those for whom neither is true.
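The grouping rule above can be written down as a short sketch. It assumes the daily fruit and vegetable answers are stored as 0/1 columns named `Fruits` and `Veggies`, and smoking as a 0/1 column named `Smoker`; the function name, the 'mixed' label for respondents who meet only one of the two conditions, and the sample rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def classify_eaters(df: pd.DataFrame) -> pd.Series:
    """Label respondents as healthy, unhealthy, or mixed eaters."""
    both = (df["Fruits"] == 1) & (df["Veggies"] == 1)
    neither = (df["Fruits"] == 0) & (df["Veggies"] == 0)
    labels = np.select(
        [both, neither],
        ["healthy eater", "unhealthy eater"],
        default="mixed",  # eats only one of the two daily
    )
    return pd.Series(labels, index=df.index, name="eater_group")


# Invented sample rows, not taken from the real survey.
sample = pd.DataFrame({
    "Fruits":  [1, 1, 0, 0, 1, 0],
    "Veggies": [1, 1, 0, 0, 0, 1],
    "Smoker":  [0, 1, 1, 1, 0, 1],
})
sample["eater_group"] = classify_eaters(sample)

# Keep only the two groups defined in the text, then compare smoking rates.
two_groups = sample[sample["eater_group"] != "mixed"]
smoking_rate = two_groups.groupby("eater_group")["Smoker"].mean()
print(smoking_rate)
```

Marking the in-between respondents explicitly, rather than silently dropping them, makes it easy to report how many participants fall outside the two groups.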
Using this classification, a visualization is possible for the potential correlations between eating behavior and addictions.
From these diagrams it is apparent that smoking is a habit that occurs more often amongst people that also eat unhealthily. This is an interesting connection that shows there is a certain psychological and habitual factor regarding diet. Those more susceptible to the bad habit of smoking are more likely to also form bad habits when it comes to food. This correlation is harder to find for alcohol consumption, perhaps because our dataset contains very few participants that drink heavily, or because there is no such connection as with smoking.
In order to give a clear view of all the correlations between bad habits and multiple health indicators (BMI, high blood pressure and high cholesterol), the following graph displays all of them:
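One way to obtain the numbers behind such an overview is a correlation table with habits as rows and health indicators as columns. The sketch below assumes the column names `Smoker`, `HvyAlcoholConsump`, `BMI`, `HighBP` and `HighChol`; the function name and sample rows are invented, and Pearson correlation on 0/1 columns is used only as a simple, rough measure of association.

```python
import pandas as pd

HABITS = ["Smoker", "HvyAlcoholConsump"]
INDICATORS = ["BMI", "HighBP", "HighChol"]


def habit_indicator_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations: one row per habit, one column per indicator."""
    corr = df[HABITS + INDICATORS].corr()
    return corr.loc[HABITS, INDICATORS]


# Invented sample rows, not taken from the real survey.
sample = pd.DataFrame({
    "Smoker":            [0, 0, 1, 1, 1, 0],
    "HvyAlcoholConsump": [0, 1, 0, 0, 1, 0],
    "BMI":               [24.0, 26.0, 31.0, 29.0, 33.0, 23.0],
    "HighBP":            [0, 0, 1, 1, 1, 0],
    "HighChol":          [0, 1, 1, 0, 1, 0],
})
table = habit_indicator_corr(sample)
print(table.round(2))
```

The resulting table can be passed directly to a heatmap function of a plotting library to produce a single graph covering all habit and indicator pairs.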